Transliterated Search using Syllabification Approach

Authors

Hardik Joshi

Apurva Bhatt

Abstract

Machine transliteration refers to the automatic conversion of a word from one language to another without losing its phonological characteristics. In this work, we present the experiments we performed in subtask-1 and subtask-2 of the FIRE-2013 transliterated search task. In both subtasks, transliteration from Roman script to Devanagari script was performed using a syllabification approach that converted Roman-script (English) input into Hindi. In the query labeling subtask, English and Hindi words were identified using a hybrid approach that combined morphological analysis of English words with a corpus-based approach for identifying frequently occurring Hindi words. In the subtask on multi-script ad hoc retrieval of Hindi song lyrics, queries were formulated in two forms for separate run submissions: one containing both Roman and Devanagari script, and one containing Roman script only. The evaluation showed a high recall for query labeling in subtask-1, whereas the results of subtask-2 indicate average performance.
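The abstract does not specify the syllabification rules, the mapping tables, the corpus, or how the two labeling signals are combined. The following Python code is a minimal sketch of how the two steps could fit together, under explicitly hypothetical assumptions. None of the following are the authors' resources:

- the small `CONSONANTS`/`VOWELS` tables and the greedy tokenizer;
- the word-final "a" → "ा" heuristic;
- the `HINDI_FREQ` counts, the `ENGLISH_LEXICON`, and the `SUFFIXES` list;
- the `min_freq=50` threshold;
- the rule that unknown words default to Hindi.

```python
# Hypothetical sketch: syllable-based Roman -> Devanagari transliteration plus
# hybrid E/H query labeling. Tables, counts and thresholds are illustrative only.
CONSONANTS = {
    "kh": "ख", "gh": "घ", "ch": "च", "jh": "झ", "th": "थ", "dh": "ध",
    "ph": "फ", "bh": "भ", "sh": "श", "k": "क", "g": "ग", "j": "ज",
    "t": "त", "d": "द", "n": "न", "p": "प", "b": "ब", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व", "w": "व", "s": "स", "h": "ह",
}
# Roman vowel -> (independent vowel letter, dependent vowel sign / matra)
VOWELS = {
    "aa": ("आ", "ा"), "ai": ("ऐ", "ै"), "au": ("औ", "ौ"), "ee": ("ई", "ी"),
    "oo": ("ऊ", "ू"), "a": ("अ", ""), "i": ("इ", "ि"), "u": ("उ", "ु"),
    "e": ("ए", "े"), "o": ("ओ", "ो"),
}
VIRAMA = "्"  # joins consonants inside a cluster

HINDI_FREQ = {"mera": 120, "dil": 95, "pyaar": 80, "main": 300}  # hypothetical counts
ENGLISH_LEXICON = {"love", "song", "play", "dance"}                # hypothetical lexicon
SUFFIXES = ("ing", "ed", "es", "s", "ly")                          # crude morphology


def tokenize(word):
    """Greedy longest-match split of a Roman word into phonetic units."""
    units, i = [], 0
    while i < len(word):
        for size in (2, 1):
            piece = word[i:i + size]
            if len(piece) == size and (piece in CONSONANTS or piece in VOWELS):
                units.append(piece)
                i += size
                break
        else:
            units.append(word[i])  # unknown character: kept as-is
            i += 1
    return units


def syllabify(units):
    """Group units into (consonant cluster, vowel or None) syllables."""
    syllables, cluster = [], []
    for unit in units:
        if unit in VOWELS:
            syllables.append((cluster, unit))
            cluster = []
        else:
            cluster.append(unit)
    if cluster:  # trailing consonants form a final vowel-less syllable
        syllables.append((cluster, None))
    return syllables


def to_devanagari(word):
    syllables = syllabify(tokenize(word.lower()))
    out = []
    for idx, (cluster, vowel) in enumerate(syllables):
        consonants = VIRAMA.join(CONSONANTS.get(c, c) for c in cluster)
        if vowel == "a" and cluster and idx == len(syllables) - 1:
            vowel = "aa"  # heuristic: word-final "a" usually means long aa
        if vowel is None:
            out.append(consonants)  # bare final consonant, no virama
        elif not cluster:
            out.append(VOWELS[vowel][0])  # vowel with no preceding consonant
        else:
            out.append(consonants + VOWELS[vowel][1])
    return "".join(out)


def is_english(word):
    """The word, or its stem after stripping one known suffix, is in the lexicon."""
    if word in ENGLISH_LEXICON:
        return True
    return any(word.endswith(suf) and word[:-len(suf)] in ENGLISH_LEXICON
               for suf in SUFFIXES)


def label(word, min_freq=50):
    """Frequent corpus Hindi words win; else morphology; unknown words -> Hindi."""
    w = word.lower()
    if HINDI_FREQ.get(w, 0) >= min_freq:
        return "H"
    return "E" if is_english(w) else "H"


def process_query(query):
    """Label each token and transliterate the ones labeled Hindi."""
    result = []
    for w in query.split():
        tag = label(w)
        result.append((w, tag, to_devanagari(w) if tag == "H" else w))
    return result
```

The key step is grouping each consonant cluster with the vowel that follows it. This lets a cluster such as "st" in "dost" be joined with a virama (दोस्त), while a lone final consonant is written bare. A realistic system would need much larger tables. It would still face the spelling variation of Roman-script Hindi: with these tables, "pyaar" yields प्यार but "pyar" yields प्यर.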

Similar resources

LIGA and Syllabification Approach for Language Identification and Back Transliteration : Shared Task Report by DAIICT

This paper aims to address the solution for Subtask 1 of the Shared Task on transliterated search, a task in FIRE ’14. The task addresses the problem of data containing English words and transliterated words of Indian languages in English. The task calls for language identification and subsequent back transliteration into the native Indian scripts. The system proposed herewith implements Language ...

Constructing Transliteration Lexicons from Web Corpora

This paper proposes a novel approach to automating the construction of transliterated-term lexicons. A simple syllable alignment algorithm is used to construct confusion matrices for cross-language syllable-phoneme conversion. Each row in the confusion matrix consists of a set of syllables in the source language that are (correctly or erroneously) matched phonetically and statistically to a syl...

A Relevance feedback based approach for mixed script transliterated text search: Shared Task report by BIT Mesra, India

This paper describes the experiments carried out as part of the participation in the FIRE-2014 Transliterated Search Shared Task. We participated in subtask-2 and submitted two results generated by systems based on a relevance feedback approach. Given a collection of documents in mixed script, the task is to retrieve relevant documents using queries in either script. The spelling variation between dif...

Encoding transliteration variation through dimensionality reduction: FIRE Shared Task on Transliterated Search

There exists a large amount of user-generated Web content in Roman script, for various reasons, for languages that are written in indigenous scripts. In light of this phenomenon, search engines face a non-trivial problem of matching queries and documents in transliterated space, where transliterated content contains extensive spelling variation. This paper describes our proposed method ...

Generating Paired Transliterated-cognates Using Multiple Pronunciation Characteristics from Web corpora

A novel approach to automatically extracting paired transliterated-cognates from Web corpora is proposed in this paper. One of the most important issues addressed is that of taking multiple pronunciation characteristics into account. Terms from various languages may be pronounced very differently. Incorporating knowledge of word origin may improve the pronunciation accuracy of terms. The accura...

Publication date: 2013